Shotgun Metagenomic Data Analysis ◾ 321
We need to index the sorted BAM files using the “samtools index” command.
# Index every sorted BAM file in the current directory, using 4 threads
for i in *.sorted
do
samtools index -@ 4 "${i}"
done
Then, we will use “samtools idxstats” to generate some statistics from the sorted BAM files.
samtools idxstats ERR1823587_healthy.bam.sorted > ERR1823587_healthy_stat.txt
samtools idxstats ERR1823601_moderate.bam.sorted > ERR1823601_moderate_stat.txt
samtools idxstats ERR1823608_severe.bam.sorted > ERR1823608_severe_stat.txt
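The three commands above follow a single naming pattern, so they can also be generated rather than typed. The following is a minimal Python 3 sketch, not part of the original workflow; the helper names `stat_name`, `run_idxstats`, and `run_all` are hypothetical, and the sketch assumes that samtools is on the PATH and that the sorted files end in ".bam.sorted" as they do in this chapter.

```python
import subprocess
from pathlib import Path

SUFFIX = ".bam.sorted"  # naming convention used for the sorted BAM files in this chapter


def stat_name(sorted_bam):
    """Map 'X.bam.sorted' to 'X_stat.txt' (hypothetical helper)."""
    name = Path(sorted_bam).name
    if not name.endswith(SUFFIX):
        raise ValueError("unexpected file name: " + name)
    return name[: -len(SUFFIX)] + "_stat.txt"


def run_idxstats(sorted_bam):
    """Run 'samtools idxstats' on one file and write its output to X_stat.txt."""
    out_path = stat_name(sorted_bam)
    with open(out_path, "w") as out:
        # Equivalent to: samtools idxstats X.bam.sorted > X_stat.txt
        subprocess.run(
            ["samtools", "idxstats", str(sorted_bam)], stdout=out, check=True
        )
    return out_path


def run_all(directory="."):
    """Process every *.bam.sorted file found in the given directory."""
    return [run_idxstats(p) for p in sorted(Path(directory).glob("*" + SUFFIX))]
```

Calling `run_all()` in the directory that holds the sorted BAM files should write one "_stat.txt" file per sample into the current working directory, with the same names as in the commands above.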
The output of the “samtools idxstats” command is a TAB-delimited file in which each line
consists of the reference sequence name, the sequence length, the number of mapped
read-segments, and the number of unmapped read-segments. From those files, we can generate
an abundance table similar to the OTU (operational taxonomic unit) table generated by
clustering the amplicon-based reads in Chapter 7. For this purpose, we can use the
“get_count_table.py” script, which can be cloned from GitHub using the following command:
git clone https://github.com/metajinomics/mapping_tools.git
We can then use this script, which is written for Python 2, to generate an abundance table
for each sample. If Python 2 is not installed on your computer, you will need to install it first.
python2 mapping_tools/get_count_table.py ERR1823587_healthy_stat.txt > ERR1823587_healthy_count.txt
python2 mapping_tools/get_count_table.py ERR1823601_moderate_stat.txt > ERR1823601_moderate_count.txt
python2 mapping_tools/get_count_table.py ERR1823608_severe_stat.txt > ERR1823608_severe_count.txt
cd ..
We will use the output of this script for binning in the next step.
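To make the idxstats format described above concrete, the following minimal Python 3 sketch reads the four TAB-delimited columns and combines the mapped-read counts of several samples into one table with a column per sample. It is an illustration only: it is not the “get_count_table.py” script, the function names `read_idxstats` and `merge_counts` are hypothetical, the merged layout is an assumption, and the contig names and counts in the example are invented.

```python
import csv
import io


def read_idxstats(handle):
    """Return {reference_name: mapped_read_segments} from idxstats text."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip malformed lines and the final '*' line, which only
        # reports reads that were not assigned to any reference.
        if len(row) < 4 or row[0] == "*":
            continue
        name, _length, mapped, _unmapped = row[:4]
        counts[name] = int(mapped)
    return counts


def merge_counts(per_sample):
    """Turn {sample: {reference: count}} into a header and one row per reference."""
    samples = list(per_sample)
    refs = sorted({ref for counts in per_sample.values() for ref in counts})
    header = ["reference"] + samples
    # A reference that is missing from a sample gets a count of 0.
    rows = [[ref] + [per_sample[s].get(ref, 0) for s in samples] for ref in refs]
    return header, rows


# Invented example data in idxstats layout: name, length, mapped, unmapped
healthy = "contig_1\t1500\t40\t2\ncontig_2\t900\t0\t0\n*\t0\t0\t17\n"
moderate = "contig_1\t1500\t12\t0\ncontig_2\t900\t7\t1\n*\t0\t0\t5\n"

table = {
    "healthy": read_idxstats(io.StringIO(healthy)),
    "moderate": read_idxstats(io.StringIO(moderate)),
}
header, rows = merge_counts(table)
print("\t".join(header))
for row in rows:
    print("\t".join(str(value) for value in row))
```

With real data, `io.StringIO(...)` would be replaced by an open file handle such as `open("ERR1823587_healthy_stat.txt")`. The counts are raw mapped read-segments; no normalization for contig length or sequencing depth is applied here.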
8.2.7 Binning
Earlier, we described binning as the process of separating the sequences into bins that
represent the most likely taxa. Many programs can do this job, including MetaBAT 2,
CONCOCT, and MaxBin. Here, we will use MetaBAT 2 as an example. MetaBAT 2
is easiest to install with Anaconda or Miniconda.
conda install -c bioconda metabat2
conda install -c bioconda/label/cf201901 metabat2